Topic: orientation problem

Task-Agnostic Semantic Communications Relying on Information Bottleneck and Federated Meta-Learning

Apr 30, 2025

Hao Wei, Wen Wang, Wanli Ni, Wenjun Xu, Yongming Huang, Dusit Niyato, Ping Zhang

Abstract:As a paradigm shift towards pervasive intelligence, semantic communication (SemCom) has shown great potential to improve communication efficiency and provide user-centric services by delivering task-oriented semantic meanings. However, the exponential growth in connected devices, data volumes, and communication demands presents significant challenges for practical SemCom design, particularly in resource-constrained wireless networks. In this work, we first propose a task-agnostic SemCom (TASC) framework that can handle diverse tasks with multiple modalities. Aiming to explore the interplay between communications and intelligent tasks from the information-theoretical perspective, we leverage information bottleneck (IB) theory and propose a distributed multimodal IB (DMIB) principle to learn minimal and sufficient unimodal and multimodal information effectively by discarding redundancy while preserving task-related information. To further reduce the communication overhead, we develop an adaptive semantic feature transmission method under dynamic channel conditions. Then, TASC is trained based on federated meta-learning (FML) for rapid adaptation and generalization in wireless networks. To gain deep insights, we conduct a rigorous theoretical analysis and devise resource management to accelerate convergence while minimizing the training latency and energy consumption. Moreover, we develop a joint user selection and resource allocation algorithm to address the non-convex problem with theoretical guarantees. Extensive simulation results validate the effectiveness and superiority of the proposed TASC compared to baselines.

Reinforcement Learning-Based Heterogeneous Multi-Task Optimization in Semantic Broadcast Communications

Apr 28, 2025

Zhilin Lu, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

Abstract:Semantic broadcast communications (Semantic BC) for image transmission have achieved significant performance gains for single-task scenarios. Nevertheless, extending these methods to multi-task scenarios remains challenging, as different tasks typically require distinct objective functions, leading to potential conflicts within the shared encoder. In this paper, we propose a tri-level reinforcement learning (RL)-based multi-task Semantic BC framework, termed SemanticBC-TriRL, which effectively resolves such conflicts and enables the simultaneous support of multiple downstream tasks at the receiver side, including image classification and content reconstruction tasks. Specifically, the proposed framework employs a bottom-up tri-level alternating learning strategy, formulated as a constrained multi-objective optimization problem. At the first level, task-specific decoders are locally optimized using supervised learning. At the second level, the shared encoder is updated via proximal policy optimization (PPO), guided by task-oriented rewards. At the third level, a multi-gradient aggregation-based task weighting module adaptively adjusts task priorities and steers the encoder optimization. Through this hierarchical learning process, the encoder and decoders are alternately trained, and the three levels are cohesively integrated via a constrained learning objective. The convergence of SemanticBC-TriRL is also established theoretically. Extensive simulation results demonstrate the superior performance of the proposed framework across diverse channel conditions, particularly in low SNR regimes, and confirm its scalability with increasing numbers of receivers.

Two-Agent DRL for Power Allocation and IRS Orientation in Dynamic NOMA-based OWC Networks

Apr 26, 2025

Ahrar N. Hamad, Ahmad Adnan Qidan, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani

Abstract:Intelligent reflecting surface (IRS) technology is considered a promising solution in visible light communication (VLC) systems due to its potential to overcome the line-of-sight (LoS) blockage issue and enhance coverage. Moreover, integrating IRS with a downlink non-orthogonal multiple access (NOMA) transmission technique for multiple users is a smart solution to achieve a high sum rate and improve system performance. In this paper, a dynamic IRS-assisted NOMA-VLC system is modeled, and an optimization problem is formulated to maximize sum energy efficiency (SEE) and fairness among multiple mobile users under power allocation and IRS mirror orientation constraints. Due to the non-convex nature of the optimization problem and the non-linearity of the constraints, conventional optimization methods are impractical for real-time solutions. Therefore, a two-agent deep reinforcement learning (DRL) algorithm is designed for optimizing power allocation and IRS orientation based on centralized training with decentralized execution to obtain fast, real-time solutions in dynamic environments. The results show the superior performance of the proposed DRL algorithm compared to standard DRL algorithms typically used for resource allocation in wireless communication. The results also show that the proposed DRL algorithm achieves higher performance compared to deployments without IRS and with randomly oriented IRS elements.

* 12 pages, 9 figures

Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps

Apr 25, 2025

Simon Hakenes, Tobias Glasmachers

Abstract:This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGBD input and selecting discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.

* 14 pages, 6 figures

Terrain-Aware Kinodynamic Planning with Efficiently Adaptive State Lattices for Mobile Robot Navigation in Off-Road Environments

Apr 24, 2025

Eric R. Damm, Jason M. Gregory, Eli S. Lancaster, Felix A. Sanchez, Daniel M. Sahu, Thomas M. Howard

Abstract:To safely traverse non-flat terrain, robots must account for the influence of terrain shape in their planned motions. Terrain-aware motion planners use an estimate of the vehicle roll and pitch as a function of pose, vehicle suspension, and ground elevation map to weigh the cost of edges in the search space. Encoding such information in a traditional two-dimensional cost map is limiting because it is unable to capture the influence of orientation on the roll and pitch estimates from sloped terrain. The research presented herein addresses this problem by encoding kinodynamic information in the edges of a recombinant motion planning search space based on the Efficiently Adaptive State Lattice (EASL). This approach, which we describe as a Kinodynamic Efficiently Adaptive State Lattice (KEASL), differs from the prior representation in two ways. First, this method uses a novel encoding of velocity and acceleration constraints and vehicle direction at expanded nodes in the motion planning graph. Second, this approach describes additional steps for evaluating the roll, pitch, constraints, and velocities associated with poses along each edge during search in a manner that still enables the graph to remain recombinant. Velocities are computed using an iterative bidirectional method using Eulerian integration that more accurately estimates the duration of edges that are subject to terrain-dependent velocity limits. Real-world experiments on a Clearpath Robotics Warthog Unmanned Ground Vehicle were performed in a non-flat, unstructured environment. Results from 2093 planning queries from these experiments showed that KEASL provided a more efficient route than EASL in 83.72% of cases when EASL plans were adjusted to satisfy terrain-dependent velocity constraints. An analysis of relative runtimes and differences between planned routes is additionally presented.

* 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 9918-9925
* 8-page paper with 1 additional copyright page

Towards Equitable Rail Service Allocation Through Fairness-Oriented Timetabling in Liberalized Markets

Apr 24, 2025

David Muñoz-Valero, Juan Moreno-Garcia, Julio Alberto López-Gómez, Enrique Adrian Villarrubia-Martin

Abstract:Over the last few decades, European rail transport has undergone major changes as part of the process of liberalization set out in European regulations. In this context of liberalization, railway undertakings compete with each other for the limited infrastructure capacity available to offer their rail services. The infrastructure manager is responsible for the equitable allocation of infrastructure between all companies in the market, which is essential to ensure the efficiency and sustainability of this competitive ecosystem. In this paper, a methodology based on Jain, Gini and Atkinson equity metrics is used to solve the rail service allocation problem in a liberalized railway market, analyzing the solutions obtained. The results show that the proposed methodology and the equity metrics used allow for equitable planning in different competitiveness scenarios. These results contrast with solutions where the objective of the infrastructure manager is to maximize its own profit, without regard for the equitable allocation of infrastructure. Therefore, the computational tests support the methodology and metrics used as a planning and decision support tool in a liberalized railway market.

* 30 pages, 7 figures
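
The three equity metrics named in the abstract above have standard textbook definitions, sketched here in Python for reference. This is not the authors' formulation: how the paper applies the metrics to timetabling is not stated in the abstract, and the function names, the `epsilon` default, and the example allocations (services granted per railway undertaking) are assumptions made for illustration.

```python
import math


def jain_index(x):
    """Jain's fairness index: 1/n when one party gets everything, 1 when all shares are equal."""
    n = len(x)
    total = sum(x)
    squares = sum(v * v for v in x)
    return (total * total) / (n * squares)


def gini_coefficient(x):
    """Gini coefficient from the mean absolute difference: 0 means perfectly equal shares."""
    n = len(x)
    mean = sum(x) / n
    abs_diff = sum(abs(a - b) for a in x for b in x)
    return abs_diff / (2 * n * n * mean)


def atkinson_index(x, epsilon=0.5):
    """Atkinson index with inequality aversion epsilon >= 0: 0 means perfectly equal shares.

    Assumes non-negative values; epsilon >= 1 additionally requires strictly positive values.
    """
    n = len(x)
    mean = sum(x) / n
    if epsilon == 1:
        # Limit case: one minus the ratio of geometric mean to arithmetic mean.
        geometric_mean = math.exp(sum(math.log(v) for v in x) / n)
        return 1 - geometric_mean / mean
    equally_distributed = (sum(v ** (1 - epsilon) for v in x) / n) ** (1 / (1 - epsilon))
    return 1 - equally_distributed / mean


# Hypothetical allocations: services granted to four operators.
equal = [10, 10, 10, 10]
skewed = [40, 0, 0, 0]
for name, alloc in (("equal", equal), ("skewed", skewed)):
    print(name, jain_index(alloc), gini_coefficient(alloc), atkinson_index(alloc))
```

Note the opposite orientations: Jain's index rises toward 1 as the allocation becomes more equal, while the Gini and Atkinson indices fall toward 0.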

Geometric Formulation of Unified Force-Impedance Control on SE(3) for Robotic Manipulators

Apr 23, 2025

Joohwan Seo, Nikhil Potu Surya Prakash, Soomi Lee, Arvind Kruthiventy, Megan Teng, Jongeun Choi, Roberto Horowitz

Abstract:In this paper, we present an impedance control framework on the SE(3) manifold, which enables force tracking while guaranteeing passivity. Building upon the unified force-impedance control (UFIC) and our previous work on geometric impedance control (GIC), we develop the geometric unified force impedance control (GUFIC) to account for the SE(3) manifold structure in the controller formulation using a differential geometric perspective. As in the case of the UFIC, the GUFIC utilizes energy tank augmentation for both force-tracking and impedance control to guarantee the manipulator's passivity relative to external forces. This ensures that the end effector maintains safe contact interaction with uncertain environments and tracks a desired interaction force. Moreover, we resolve a non-causal implementation problem in the UFIC formulation by introducing velocity and force fields. Due to its formulation on SE(3), the proposed GUFIC inherits the desirable SE(3) invariance and equivariance properties of the GIC, which helps increase sample efficiency in machine learning applications where a learning algorithm is incorporated into the control law. The proposed control law is validated in a simulation environment under scenarios requiring tracking an SE(3) trajectory, incorporating both position and orientation, while exerting a force on a surface. The codes are available at https://github.com/Joohwan-Seo/GUFIC_mujoco.

* Submitted to the IEEE Conference on Decision and Control (CDC) 2025

Planning with Diffusion Models for Target-Oriented Dialogue Systems

Apr 23, 2025

Hanwen Du, Bo Peng, Xia Ning

Abstract:Target-Oriented Dialogue (TOD) remains a significant challenge in the LLM era, where strategic dialogue planning is crucial for directing conversations toward specific targets. However, existing dialogue planning methods generate dialogue plans in a step-by-step sequential manner, and may suffer from compounding errors and myopic actions. To address these limitations, we introduce a novel dialogue planning framework, DiffTOD, which leverages diffusion models to enable non-sequential dialogue planning. DiffTOD formulates dialogue planning as a trajectory generation problem with conditional guidance, and leverages a diffusion language model to estimate the likelihood of the dialogue trajectory. To optimize the dialogue action strategies, DiffTOD introduces three tailored guidance mechanisms for different target types, offering flexible guidance towards diverse TOD targets at test time. Extensive experiments across three diverse TOD settings show that DiffTOD can effectively perform non-myopic lookahead exploration and optimize action strategies over a long horizon through non-sequential dialogue planning, and demonstrates strong flexibility across complex and diverse dialogue scenarios. Our code and data are accessible through https://anonymous.4open.science/r/DiffTOD.

EmoSEM: Segment and Explain Emotion Stimuli in Visual Art

Apr 22, 2025

Jing Zhang, Dan Guo, Zhangbin Li, Meng Wang

Abstract:This paper focuses on a key challenge in visual art understanding: given an art image, the model pinpoints pixel regions that trigger a specific human emotion, and generates linguistic explanations for the emotional arousal. Despite recent advances in art understanding, pixel-level emotion understanding still faces a dual challenge: first, the subjectivity of emotion makes it difficult for general segmentation models like SAM to adapt to emotion-oriented segmentation tasks; and second, the abstract nature of art expression makes it difficult for captioning models to balance pixel-level semantic understanding and emotion reasoning. To solve the above problems, this paper proposes the Emotion stimuli Segmentation and Explanation Model (EmoSEM) to endow the segmentation model SAM with emotion comprehension capability. First, to enable the model to segment well under the guidance of emotional intent, we introduce an emotional prompt with a learnable mask token as the conditional input for segmentation decoding. Then, we design an emotion projector to establish the association between emotion and visual features. Next, and more importantly, to address emotion-visual stimuli alignment, we develop a lightweight prefix projector, a module that fuses the learned emotional mask with the corresponding emotion into a unified representation compatible with the language model. Finally, we input the joint visual, mask, and emotional tokens into the language model and output the emotional explanations. This ensures that the generated interpretations remain semantically and emotionally coherent with the visual stimuli. The method realizes end-to-end modeling from low-level pixel features to high-level emotion interpretation, providing the first interpretable fine-grained analysis framework for artistic emotion computing. Extensive experiments validate the effectiveness of our model.

Haptic-based Complementary Filter for Rigid Body Rotations

Apr 20, 2025

Amit Kumar, Domenico Campolo, Ravi N. Banavar

Abstract:The non-commutative nature of 3D rotations poses well-known challenges in generalizing planar problems to three-dimensional ones, even more so in contact-rich tasks where haptic information (i.e., forces/torques) is involved. In this sense, not all learning-based algorithms that are currently available generalize to 3D orientation estimation. Non-linear filters defined on $\mathrm{SO}(3)$ are widely used with inertial measurement sensors; however, none of them have been used with haptic measurements. This paper presents a unique complementary filtering framework that interprets the geometric shape of objects in the form of superquadrics, exploits the symmetry of $\mathrm{SO}(3)$, and uses force and vision sensors as measurements to provide an estimate of orientation. The framework's robustness and almost global stability are substantiated by a set of experiments on a dual-arm robotic setup.

* 7 pages, 6 figures
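
The non-commutativity mentioned in the abstract above can be checked numerically. The sketch below is a generic illustration, not part of the paper's filter: it builds elementary rotation matrices in plain Python and compares the two composition orders. All helper names (`rot_x`, `rot_z`, `matmul`, `close`) are hypothetical.

```python
import math


def rot_x(theta):
    """Rotation matrix about the x-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(theta):
    """Rotation matrix about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def close(a, b, tol=1e-9):
    """True when every entry of a and b agrees within tol."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(3) for j in range(3))


quarter = math.pi / 2

# Rotations about different axes: the composition order matters.
xz = matmul(rot_x(quarter), rot_z(quarter))
zx = matmul(rot_z(quarter), rot_x(quarter))
print("x then z equals z then x:", close(xz, zx))

# Planar case: rotations about the same axis commute.
planar_ab = matmul(rot_z(0.3), rot_z(1.1))
planar_ba = matmul(rot_z(1.1), rot_z(0.3))
print("planar rotations commute:", close(planar_ab, planar_ba))
```

The planar check passes because rotations about a single axis simply add their angles, which is why methods designed for 2D orientation do not automatically carry over to 3D.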
